A segment-based interpretation of HMM/ANN hybrids

نویسندگان

  • László Tóth
  • András Kocsor
چکیده

Here we seek to understand the similarities and differences between two speech recognition approaches, namely the HMM/ANN hybrid and the posterior-based segmental model. Both these techniques create local posterior probability estimates and combine these estimates into global posteriors – but they are built on somewhat different concepts and mathematical derivations. The HMM/ANN hybrid combines the local estimates via a formulation that is inherited from the generative HMM concept, while the components of the segment-based model correspond quite directly to the two subtasks of phonetic decoding: segmentation and classification. In this paper we attempt to identify the corresponding components of the segmental model within the hybrid model, with the intent of gaining an insight from this unusual point of view. As regards one of these components, the segment-based phone posteriors, we show that the independence-based product rule combination applied in the hybrid produces strongly biased estimates. As for the other component, the segmentation probability factor, we argue that it is present in the hybrid thanks to the bias of the product rule – that is, the product rule goes wrong in such a special way that it helps the model find the best segmentation of the input. To prove this assertion, we combine this bias with the posterior estimates obtained by averaging, and find that the resulting ‘averaging hybrid’ slightly outperforms the standard one on a phone recognition task and a word recognition task as well. Overall we conclude that the contribution of the product rule to the decoding process is just as important for the segmentation subtask as it is for the segment classification subtask. 2007 Elsevier Ltd. All rights reserved.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Explicit Duration Modelling in HMM/ANN Hybrids

In some languages like Finnish or Hungarian phone duration is a very important distinctive acoustic cue. The conventional HMM speech recognition framework, however, is known to poorly model the duration information. In this paper we compare different duration models within the framework of HMM/ANN hybrids. The tests are performed with two different hybrid models, the conventional one and the “a...

متن کامل

Speech enhancement based on hidden Markov model using sparse code shrinkage

This paper presents a new hidden Markov model-based (HMM-based) speech enhancement framework based on the independent component analysis (ICA). We propose analytical procedures for training clean speech and noise models by the Baum re-estimation algorithm and present a Maximum a posterior (MAP) estimator based on Laplace-Gaussian (for clean speech and noise respectively) combination in the HMM ...

متن کامل

A Review of Epidemic Forecasting Using Artificial Neural Networks

Background and aims: Since accurate forecasts help inform decisions for preventive health-careintervention and epidemic control, this goal can only be achieved by making use of appropriatetechniques and methodologies. As much as forecast precision is important, methods and modelselection procedures are critical to forecast precision. This study aimed at providing an overview o...

متن کامل

Hidden neural networks: a framework for HMM/NN hybrids

This paper presents a general framework for hybrids of Hidden Markov models (HMM) and neural networks (NN). In the new framework called Hidden Neural Networks (HNN) the usual HMM probability parameters are replaced by neural network outputs. To ensure a probabilistic interpretation the HNN is normalized globally as opposed to the local normalization enforced on parameters in standard HMMs. Furt...

متن کامل

Hybrid HMM/Neural Network based Speech Recognition in Loquendo ASR

This paper describes hybrid Hidden Markov Models / Artificial Neural Networks (HMM/ANN) models devoted to speech recognition, and in particular Loquendo HMM/ANN, that is the core of Loquendo ASR. While Hidden Markov Models (HMM) is a dominant approach in most state-of-the-art speaker-independent, continuous speech recognition systems (and commercial products), Artificial Neural Networks (ANN) a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Computer Speech & Language

دوره 21  شماره 

صفحات  -

تاریخ انتشار 2007